Reversible jump Markov chain Monte Carlo (RJMCMC) extends ordinary
MCMC methods for use in Bayesian multimodel inference. We show that
RJMCMC can be implemented as Gibbs sampling with alternating updates
of a model indicator and a vector-valued "palette" of parameters denoted
$\psi$. Just as an artist uses the palette to mix dabs of color for specific
needs, we create model-specific parameters from the set available in $\psi$.
This description not only removes some of the mystery of RJMCMC, but also
provides a basis for fitting models one at a time using ordinary MCMC and
computing model weights or Bayes factors by post-processing the Monte
Carlo output. We illustrate our procedure using several examples.

1 Introduction

A natural Bayesian approach to problems of multimodel inference is to
compute posterior model probabilities, or equivalently Bayes factors, given
priors and data. Bayes factors involve marginal likelihoods and can be
difficult to calculate. Here we address the problem of estimating posterior
model probabilities, and provide a representation of reversible jump Markov
chain Monte Carlo (RJMCMC) that allows us to use MCMC output obtained by
fitting models one at a time.

Techniques have been proposed for computing Bayes factors using MCMC
output from independent chains generated for different models (for example,
Chib 1995) or by using a search over the joint space of model indicators
$M \in \mathcal{M}$ and model parameters $\theta_j \in \Theta_j$ given by
$\mathcal{M} \times \prod_{j \in \mathcal{M}} \Theta_j$. Either approach is
difficult in practice and it is common for model selection to be based
instead on a deviance information criterion (DIC) (Spiegelhalter et al.
2002). However, there is no theoretical justification for using DIC to
produce model weights or Bayes factors.

Reversible jump Markov chain Monte Carlo is an extension of ordinary
MCMC for multimodel inference. In this broader context, the posterior
distribution under investigation describes parameters for a collection of
models, rather than a single model; furthermore, the posterior distribution
describes model uncertainty through weights on a categorical variable we
call Model.

A key step in implementing RJMCMC is the specification of bijections
describing relationships between the parameters of various models (e.g.,
Green 1995; Gelman et al. 2004). RJMCMC is usually described in terms of
$\binom{K}{2}$ such bijections, where $K$ is the number of models in the
model set $\mathcal{M}$.



Link and Barker (2010) outlined an alternative formulation of RJMCMC as
simple Gibbs sampling, alternating between updating a palette of parameters
$\psi$, which is of the same dimension for all models, and the categorical
variable Model. There are $K$ bijections, one relating each model's
parameters to the palette $\psi$; the $\binom{K}{2}$ bijections typically
described are obtained from these. Careful construction of the palette and
$K$ bijections allows RJMCMC to be carried out using samples from
model-specific posteriors, obtained one model at a time. Here we illustrate
this approach and extend Link and Barker (2010) by showing that moves
between models can be written so that they involve a direct draw from a
known categorical distribution with all models in the sample space. This
formulation obviates the need for a Metropolis-Hastings step that only
allows pair-wise comparison of models, and can be easier to implement than
RJMCMC in its usual incarnations.

2 A description of RJMCMC

Suppose that we wish to evaluate the relative support provided by data $y$
to models $M_k$, $k = 1, 2, \ldots, K$, these models being fully known
except for parameter vectors $\theta^{(k)}$ for which we have specified
priors.

RJMCMC can be expressed as simple Gibbs sampling, with draws alternating
between the categorical variable Model and a universal vector-valued
parameter $\psi$. We compare $\psi$ to an artist's palette: as the artist
combines colors on her palette to produce colors needed for specific
applications, so components of $\psi$ are combined to produce model-specific
parameters $\theta^{(k)}$. The important feature of RJMCMC is that the
entire palette $\psi$ is updated at each step of the Gibbs sampler (rather
than simply those components relating to the present model).



Following Link and Barker (2010) the palette of parameters $\psi$ is a
vector of dimension $d$ greater than or equal to the dimension of the most
complex model in the model set. Parameter vector $\theta^{(k)}$ can be
recovered from the palette $\psi$ by means of a known (invertible) mapping
$g_k(\psi) = \Theta^{(k)} = (\theta^{(k)\prime}, u^{(k)\prime})'$. Vector
$u^{(k)}$ is irrelevant to model $M_k$, serving only to make the dimension
of $\Theta^{(k)}$ match that of $\psi$, so that $g_k(\cdot)$ can be defined
as a bijection. Thus if model $M_2$ has parameter space of dimension 7, and
$d = 10$, vector $u^{(2)}$ will have dimension 3.

Note that the $\binom{K}{2}$ bijections typically required for RJMCMC are
induced by our $K$ bijections between the palette and model-specific
parameter spaces: for models $M_j$ and $M_k$ we have
$g_{jk}(\Theta^{(j)}) = g_k \circ g_j^{-1}(\Theta^{(j)}) = g_k(\psi) = \Theta^{(k)}$.
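The way pairwise bijections fall out of the $K$ palette maps can be sketched in a few lines of Python. The two maps below are hypothetical, chosen only to illustrate the composition $g_k \circ g_j^{-1}$ on a two-dimensional palette; the function names are assumptions made for this sketch.

```python
# Hypothetical two-model illustration on a palette psi = (psi_1, psi_2).
# The maps are assumptions for this sketch, not a prescribed construction.

def g1(psi):
    """Model 1: identity map, Theta^(1) = psi."""
    return (psi[0], psi[1])

def g1_inv(theta):
    return (theta[0], theta[1])

def g2(psi):
    """Model 2: Theta^(2) = (palette mean, supplemental variable u = psi_2)."""
    return ((psi[0] + psi[1]) / 2.0, psi[1])

def g2_inv(theta):
    m, u = theta
    return (2.0 * m - u, u)

def g12(theta1):
    """Induced between-model bijection g_12 = g_2 o g_1^{-1}."""
    return g2(g1_inv(theta1))

def g21(theta2):
    """Induced between-model bijection g_21 = g_1 o g_2^{-1}."""
    return g1(g2_inv(theta2))

theta1 = (0.25, 0.75)
theta2 = g12(theta1)   # model-2 parameters reached through the palette
back = g21(theta2)     # returning through the palette recovers theta1
```

Only the $K$ maps and their inverses are written by hand; every pairwise move is obtained by composition.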

We must specify a prior for $\psi$ and in doing so accommodate
model-specific priors $[\theta^{(k)} \mid \text{Model} = M_k]$, which for
simplicity we write as $[\theta^{(k)} \mid M_k]$. We have

$$[\Theta^{(k)} \mid M_k] = [\theta^{(k)}, u^{(k)} \mid M_k]
  = [\theta^{(k)} \mid M_k]\,[u^{(k)} \mid \theta^{(k)}, M_k].$$

All that is needed is a specification of $[u^{(k)} \mid \theta^{(k)}, M_k]$;
given that $u^{(k)}$ has no role in inference, it will be convenient to
assume it is conditionally independent of $\theta^{(k)}$, so that
$[u^{(k)} \mid \theta^{(k)}, M_k] = [u^{(k)} \mid M_k]$. The specific choice
does not matter, except for tuning the RJMCMC algorithm. From
$[\Theta^{(k)} \mid M_k]$ we obtain $[\psi \mid M_k]$ using the change of
variables theorem in terms of a prior
$f_k(\Theta^{(k)}) = [\theta^{(k)} \mid M_k]\,[u^{(k)} \mid M_k]$. The prior
on $\psi$ is then

$$[\psi] = \sum_{k} [\psi \mid M_k]\,[M_k]$$

where

$$[\psi \mid M_k] = f_k(g_k(\psi))
  \left| \frac{\partial g_k(\psi)}{\partial \psi} \right|. \qquad (1)$$

Under this formulation, Gibbs sampling consists of cyclical sampling of
full conditional distributions, alternating between draws from
$[\psi \mid M_k, y]$ to update $\psi$, and from
$[\{M_1, \ldots, M_K\} \mid \psi, y]$ to update Model.

Updating $\psi$

To draw from the full conditional $[\psi \mid M_k, y]$ it suffices to draw
from $[\theta^{(k)} \mid M_k, y]$ and $[u^{(k)} \mid M_k]$ and then to apply
the inverse transformation $g_k^{-1}(\Theta^{(k)})$ to obtain a draw for
$\psi$.

The draw from $[\theta^{(k)} \mid M_k, y]$ can either be made directly, if
the distribution is of convenient form, or by simulation. Another
possibility, often an attractive alternative, is to take a random draw from
the stored MCMC output of an earlier analysis of model $M_k$.

Updating the model 

The full conditional for Model is categorical with probabilities

$$[M_k \mid \psi, y] =
  \frac{[y \mid \psi, M_k]\,[\psi \mid M_k]\,[M_k]}
       {\sum_{j=1}^{K} [y \mid \psi, M_j]\,[\psi \mid M_j]\,[M_j]}$$

for $k = 1, \ldots, K$. If we are willing to calculate all of these
probabilities, we can update Model by a direct draw from this full
conditional distribution. Chain means of model indicators
$I(\text{Model} = M_k)$ converge to the posterior model probabilities, but
greater efficiency is available by using chain means of the full
conditional model probabilities, which also converge to the posterior model
probabilities.
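In practice the terms $[y \mid \psi, M_k][\psi \mid M_k][M_k]$ are usually computed on the log scale, where they can be far too small to exponentiate directly. The sketch below shows one way the normalization and the direct draw might be coded; the function names and the use of a log-sum-exp shift are assumptions of this sketch, not something specified above.

```python
import math
import random

def model_probabilities(log_terms):
    """Normalize log([y|psi,M_k][psi|M_k][M_k]) over k via a log-sum-exp shift."""
    finite = [t for t in log_terms if t != -math.inf]
    if not finite:
        raise ValueError("every model has zero weight at this palette value")
    top = max(finite)                       # subtract the max to avoid underflow
    weights = [math.exp(t - top) for t in log_terms]
    total = sum(weights)
    return [w / total for w in weights]

def draw_model(probs, rng):
    """Direct draw of a (0-based) model index from the categorical full conditional."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

# Two hypothetical models whose unnormalized weights are in ratio 1:3,
# both shifted down by 1000 on the log scale.
probs = model_probabilities([math.log(1.0) - 1000.0, math.log(3.0) - 1000.0])
rng = random.Random(1)
k = draw_model(probs, rng)
```

Without the shift, both weights in this example would underflow to zero and the ratio would be undefined.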

As an alternative, we can update model indicators by a
Metropolis-Hastings step if the model candidate generator only allows
limited transitions, for example, to a near neighbor in a graphical model
sense. The advantage of this approach is that we compute a smaller set of
categorical probabilities, corresponding to the neighborhood set. In this
case we must compute posterior model probabilities by chain means of
indicators $I(\text{Model} = M_k)$.

Expressing RJMCMC as simple Gibbs sampling provides the key innovation
of our formulation: it allows us to fit models one at a time using ordinary
MCMC and then compute model weights or Bayes factors by post-processing
the Monte Carlo output. Thus, we have a simple 2-stage procedure that can
be used for computing model probabilities:

Stage 1: Produce samples of $[\psi \mid y, M_k]$ for each $k$.

Begin by sampling $[\theta^{(k)} \mid y, M_k]$. This can be accomplished by
running an MCMC sampler for $M_k$, processing it in the usual way,
discarding any initial burn-in iterations, and storing the results. (In
cases where the posterior distribution for $\theta^{(k)}$ is of a known and
easily sampled form, we do so.) For each sampled value $\theta^{(k)}$,
independently sample an auxiliary variable from $[u^{(k)} \mid M_k]$ and
calculate $\psi = g_k^{-1}((\theta^{(k)\prime}, u^{(k)\prime})')$. The
collection of sampled values $\psi$ is a sample of $[\psi \mid y, M_k]$.
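Stage 1 amounts to a single pass over the stored output. A minimal sketch follows, assuming the stored draws are held in a Python list and that the user supplies a sampler for $u^{(k)}$ and the inverse map $g_k^{-1}$; the helper name `palette_sample` and the toy inputs are hypothetical.

```python
import random

def palette_sample(theta_draws, draw_u, g_inv, rng):
    """Turn stored posterior draws of theta^(k) into draws of the palette psi."""
    out = []
    for theta in theta_draws:
        u = draw_u(rng)               # u^(k) ~ [u^(k) | M_k], independent of theta
        out.append(g_inv(theta, u))   # psi = g_k^{-1}((theta, u)')
    return out

# Toy inputs (assumptions): scalar theta, scalar u ~ N(0, 1), palette (theta, u).
rng = random.Random(0)
stored_theta = [rng.gauss(2.0, 0.1) for _ in range(1000)]  # stands in for MCMC output
psi_draws = palette_sample(
    stored_theta,
    draw_u=lambda r: r.gauss(0.0, 1.0),
    g_inv=lambda theta, u: (theta, u),
    rng=rng,
)
```

Because $u^{(k)}$ is drawn afresh for each stored value, the model-specific MCMC run never needs to know about the palette.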

Stage 2: Post-process the model specific outputs. 

Posterior model probabilities can be computed in one of two ways. The first
method is based on generating a Markov chain of the categorical variable
Model that can be used as a posterior sample from $[\text{Model} \mid y]$.
The second method is based on generating a Markov chain of between-model
transition probabilities that can be used to estimate the between-model
transition matrix. The steady-state marginal distribution of this matrix
corresponds to the posterior model probabilities.

Method 1 A posterior sample $\text{Model}^{(1)}, \text{Model}^{(2)}, \ldots,
\text{Model}^{(j)}, \ldots$ can be generated as follows:

(a) Initialize Model, say with $\text{Model}^{(1)} = M_1$.

(b) Iterate from $j = 1$ to some large number $J$, and at each step:

(i) If $\text{Model}^{(j)} = M_k$, draw a value $\psi^{(j)}$ from the
stored sample of $[\psi \mid y, M_k]$ from Stage 1.

(ii) Compute $\pi_k^{(j)} = \Pr(\text{Model} = M_k \mid \psi^{(j)}, y)$,
for each $k$. This calculation requires the Jacobian of the transformation
$\Theta^{(k)} = g_k(\psi)$ as in eq. (1), evaluated at $\psi^{(j)}$.

(iii) Sample $\text{Model}^{(j+1)}$ from a categorical distribution with
sample space $\{M_1, M_2, \ldots, M_K\}$ and probability vector
$\pi^{(j)} = (\pi_1^{(j)}, \ldots, \pi_K^{(j)})'$.

The relative frequency with which $\text{Model}^{(j)} = M_k$ approximates
the posterior model probability for $M_k$. A better estimate
(Rao-Blackwellized) is the chain mean of values $\pi_k^{(j)}$.
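Steps (a) and (b) can be sketched as a short loop. This is an illustrative outline rather than a reference implementation: `psi_samples` and `log_weight` are hypothetical names for the Stage 1 output and for a user-supplied function returning $\log\{[y \mid \psi, M_k][\psi \mid M_k][M_k]\}$, and models are indexed from 0.

```python
import math
import random

def method1(psi_samples, log_weight, n_iter, rng, start=0):
    """Post-process stored palette draws into a chain of model indicators.

    psi_samples[k]     : stored draws from [psi | y, M_k] (Stage 1)
    log_weight(k, psi) : log of [y|psi,M_k][psi|M_k][M_k], Jacobian included;
                         assumed finite for at least one k at every psi
    Returns (indicator frequencies, Rao-Blackwellized chain means).
    """
    K = len(psi_samples)
    counts = [0] * K
    rb_sums = [0.0] * K
    model = start
    for _ in range(n_iter):
        psi = rng.choice(psi_samples[model])               # step (i)
        logs = [log_weight(k, psi) for k in range(K)]      # step (ii)
        top = max(logs)
        w = [math.exp(l - top) for l in logs]
        total = sum(w)
        pi = [x / total for x in w]
        model = rng.choices(range(K), weights=pi, k=1)[0]  # step (iii)
        counts[model] += 1
        for k in range(K):
            rb_sums[k] += pi[k]
    return [c / n_iter for c in counts], [s / n_iter for s in rb_sums]

# Toy check (assumption): weights do not depend on psi, so the target is (0.3, 0.7).
toy_samples = [[0.0], [0.0]]
toy_log_weight = lambda k, psi: math.log((0.3, 0.7)[k])
freq, rb = method1(toy_samples, toy_log_weight, n_iter=5000, rng=random.Random(7))
```

In the toy check the Rao-Blackwellized means equal the target up to rounding, while the indicator frequencies carry Monte Carlo error, which is the efficiency gain described above in miniature.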
Method 2 A further improvement on this approximation can be made by
marginalizing at stage 1, obviating the need for the construction of a
Markov chain of model indicators in the second stage. For each model $M_h$
and each stored draw of $\psi$ from that model we can compute
$\Pr(M_k \mid \psi, M_h)$, forming a Markov chain of transition
probabilities from model $h$ to model $k$ ($h, k \in 1, \ldots, K$). These
can then be averaged to form an approximation to the stochastic matrix
governing model-to-model transitions. Given an estimate of this transition
matrix we can obtain corresponding estimates of the posterior model
probabilities as the limiting distribution obtainable by normalizing the
left eigenvector of the transition matrix associated with the eigenvalue
1.0 (Seber 2008).
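The last step of Method 2 needs only the averaged transition matrix. The sketch below finds the left eigenvector for eigenvalue 1 by power iteration, which is swapped in here for an explicit eigen-decomposition so that no numerical library is required; it assumes the estimated chain is irreducible and aperiodic. The function name and the toy matrix are hypothetical.

```python
def stationary_distribution(P, n_iter=10000, tol=1e-13):
    """Left eigenvector of P for eigenvalue 1, found by power iteration."""
    K = len(P)
    # Renormalize rows so rounding error in the averaged estimates cannot
    # make a row sum drift away from 1.
    P = [[p / sum(row) for p in row] for row in P]
    pi = [1.0 / K] * K
    for _ in range(n_iter):
        new = [sum(pi[h] * P[h][k] for h in range(K)) for k in range(K)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi

# Hypothetical 2 x 2 transition matrix; its stationary distribution is (5/6, 1/6).
P_toy = [[0.9, 0.1],
         [0.5, 0.5]]
pi_toy = stationary_distribution(P_toy)
```

For two models the answer is available in closed form, $\pi_1 = P_{21} / (P_{12} + P_{21})$, which gives a quick check on any implementation.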



3 Examples 



3.1 Radiata pine data 



Carlin and Chib (1995), Han and Carlin (2001), and many others analyze
data taken from Williams (1959). The response variable $y_i$ is the maximum
compressive strength parallel to the grain for 42 radiata pine boards. Two
explanatory variables are considered: the first is the specimen's density,
$x_i$, and the second is the specimen's density having adjusted for resin
content, $z_i$. Resin increases the density of boards without increasing
their compressive strength. Carlin and Chib (1995) considered two models:



Model 1: $y_i = \alpha + \beta(x_i - \bar{x}) + \epsilon_i$, $\epsilon_i \sim N(0, \sigma_1^2)$

and

Model 2: $y_i = \gamma + \delta(z_i - \bar{z}) + \epsilon_i$, $\epsilon_i \sim N(0, \sigma_2^2)$.



In both cases the errors $\epsilon_i$ are assumed iid among observations,
conditional on the parameters.

As priors, Carlin and Chib (1995) used
$N((3000, 185)', \mathrm{diag}(10^6, 10^4))$ priors on $(\alpha, \beta)'$
and $(\gamma, \delta)'$, and inverse gamma priors on $\sigma_1^2$ and
$\sigma_2^2$, both having mean and standard deviation equal to $300^2$.
This quirky choice of priors was made to be vague but with expectations
corresponding roughly to the parameter estimates obtained by fitting the
model by least squares.



We fitted each of these models independently using BUGS (Lunn et al. 2000)
and the above priors, running three chains of 60,000 each with distinct
starting values. Discarding the first 10,000 of each chain as a burn-in
left us with a posterior sample of 150,000 for $\theta^{(1)}$ and
$\theta^{(2)}$. We coded a reversible jump algorithm in which
$g_1(\psi) = (\alpha, \beta, \sigma_1^2)'$ and
$g_2(\psi) = (\gamma, \delta, \sigma_2^2)'$. In this case,
$[\psi \mid M_1] = [\psi \mid M_2]$, and the model update is based on the
relative values of the likelihoods weighted by the model priors $\Pr(M_k)$:

$$\Pr(M_k \mid y, \psi) =
  \frac{\Pr(M_k)\exp\left\{-\frac{1}{2\psi_3}\sum_{i=1}^{42}(y_i - \mu_{ik})^2\right\}}
       {\Pr(M_1)\exp\left\{-\frac{1}{2\psi_3}\sum_{i=1}^{42}(y_i - \mu_{i1})^2\right\}
        + \Pr(M_2)\exp\left\{-\frac{1}{2\psi_3}\sum_{i=1}^{42}(y_i - \mu_{i2})^2\right\}}$$

where

$$\mu_{ik} = \begin{cases}
  \psi_1 + \psi_2(x_i - \bar{x}) & k = 1 \\
  \psi_1 + \psi_2(z_i - \bar{z}) & k = 2.
\end{cases}$$



Following Han and Carlin (2001) we assigned model priors of
$\Pr(M_1) = 0.9995$ and $\Pr(M_2) = 0.0005$ to ensure that the two models
were visited in roughly equal proportion. Starting at model 1 or model 2
the chain for the posterior model probability converges rapidly (Figure 1).
After running the two chains for 200,000 iterations and discarding the
first 100,000 as a burn-in, our estimate of the posterior model probability
was 0.709, corresponding to a Bayes factor $BF_{21}$ of 4870. These are in
close agreement with the exact values of $\Pr(M_2 \mid y) = 0.70865$ and
$BF_{21} = 4862$ reported by Han and Carlin (2001).

Figure 1 about here

Using our second method, we sampled 200,000 values of $\psi$ from each
chain $h$, and for the $i$th sample we calculated
$\Pr(M_k \mid \psi^{(i)}, M_h)$ ($k = 1, 2$). Averaging across $i$ we
obtain an estimated transition matrix of:

$$\begin{pmatrix} 0.6003 & 0.3997 \\ 0.1651 & 0.8349 \end{pmatrix}$$

with steady-state marginal distribution of $(0.2924, 0.7076)'$,
corresponding to $BF_{21} = 4838$.
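The Bayes factors quoted in this example follow from the posterior model probabilities by dividing the posterior odds by the prior odds. A short sketch of that arithmetic, using the probabilities reported above (the function name is an assumption of this sketch):

```python
def bayes_factor_21(post_m2, prior_m2):
    """BF_21 = posterior odds of M_2 over M_1 divided by the prior odds."""
    posterior_odds = post_m2 / (1.0 - post_m2)
    prior_odds = prior_m2 / (1.0 - prior_m2)
    return posterior_odds / prior_odds

bf_method2 = bayes_factor_21(0.7076, 0.0005)   # steady-state estimate, Method 2
bf_exact = bayes_factor_21(0.70865, 0.0005)    # exact posterior probability
```

With prior odds of 1/1999, small changes in the posterior probability move the Bayes factor by tens of units, which is why the estimates 4838 and 4870 both count as close to 4862.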



3.2 Trout return rates 



Link and Barker (2006) report an analysis based on fitting logistic
regression models to the return rates for brown trout expressed in terms of
sex $S_i$ and length $L_i$ effects. Modeling the return indicator
$y_i \sim \mathrm{Bern}(p_i)$ they considered five models:

1. $\eta_i = \beta_0$

2. $\eta_i = \beta_0 + \beta_1 S_i$

3. $\eta_i = \beta_0 + \beta_2 L_i$

4. $\eta_i = \beta_0 + \beta_1 S_i + \beta_2 L_i$

5. $\eta_i = \beta_0 + \beta_1 S_i + \beta_2 L_i + \beta_3 S_i L_i$

where $\eta_i = \mathrm{logit}(p_i)$.



Link and Barker (2006) used the following priors on each coefficient
$\beta_j$ of model $k$:

$$[\beta_j \mid V, M_k] = \begin{cases}
  N(0, V^{-1})    & k = 1 \\
  N(0, (2V)^{-1}) & k = 2 \\
  N(0, (2V)^{-1}) & k = 3 \\
  N(0, (3V)^{-1}) & k = 4 \\
  N(0, (4V)^{-1}) & k = 5
\end{cases}$$

where $V$ has a $\mathrm{Ga}(3.29, 7.80)$ prior distribution. This choice
was motivated by the observation that if
$\mathrm{logit}(p) \sim N(0, V^{-1})$ and $V \sim \mathrm{Ga}(3.29, 7.80)$,
then the marginal distribution of $p$ is very nearly uniform on $[0, 1]$.
With $S_i$ and $L_i$ having been standardized, these choices of priors
ensure that the prior on $e^{\eta_i}/(1 + e^{\eta_i}) = p_i$ is
approximately $U(0, 1)$ for $S_i = \pm 1$ and $L_i = \pm 1$.
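The near-uniformity of the implied prior on $p$ is easy to check by simulation. The sketch below assumes that $\mathrm{Ga}(3.29, 7.80)$ is parameterized by shape 3.29 and rate 7.80, which the text does not state explicitly; under that assumption each tenth of $[0, 1]$ should receive roughly a tenth of the draws.

```python
import math
import random

def expit(x):
    """Numerically stable inverse logit."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

rng = random.Random(2024)
n = 100_000
p_draws = []
for _ in range(n):
    # Assumption: shape 3.29 and rate 7.80; gammavariate takes a scale, 1 / rate.
    V = rng.gammavariate(3.29, 1.0 / 7.80)
    eta = rng.gauss(0.0, 1.0 / math.sqrt(V))   # logit(p) ~ N(0, 1/V)
    p_draws.append(expit(eta))

mean_p = sum(p_draws) / n
var_p = sum((p - mean_p) ** 2 for p in p_draws) / n   # U(0, 1) has variance 1/12

# Fraction of draws in each tenth of [0, 1]; a U(0, 1) variable gives 0.1 per bin.
bins = [0] * 10
for p in p_draws:
    bins[min(int(p * 10), 9)] += 1
fractions = [b / n for b in bins]
```

Reading 7.80 as a scale rather than a rate would give a much larger $V$ and a prior on $p$ concentrated near 0.5, so the simulation also serves as a check on the parameterization.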

Palette and bijections 

Each element of $\psi$ is directly associated with either an element of the
$\beta$ vector or with a supplemental variable $u$ (Table 1). The parameter
$V$ is part of the prior specification and is common to all models so we
chose to leave it out of the palette specification, although this is not
necessary. Updates for $V$ were stored when each model was fitted.

Model         1          2          3          4          5
$\psi_1$   $\beta_0$  $\beta_0$  $\beta_0$  $\beta_0$  $\beta_0$
$\psi_2$   $u_1$      $\beta_1$  $u_1$      $\beta_1$  $\beta_1$
$\psi_3$   $u_2$      $u_1$      $\beta_2$  $\beta_2$  $\beta_2$
$\psi_4$   $u_3$      $u_2$      $u_2$      $u_1$      $\beta_3$

Table 1: Association between elements of $\psi$ and elements of
$\beta^{(k)}$, specific parameters for model $k$, and supplemental
variables $u^{(k)}$ used in model $k$ for matching the parameter dimension
to $\psi$.

For a particular model, the priors on the supplemental variables were the
same as the priors used for the $\beta$ coefficients in that model, and in
each case the Jacobian of the transformation from $\Theta^{(k)}$ to $\psi$
is an identity matrix of dimension 5.
As an example, Model 1 (constant only) has parameter vector
$\theta^{(1)} = \beta_0$ with supplemental variables
$u^{(1)} = (u_1, u_2, u_3)'$. Thus

$$\Theta^{(1)} = g_1(\psi) = (\beta_0, u_1, u_2, u_3)'
  = (\psi_1, \psi_2, \psi_3, \psi_4)'$$

leading to:

$$[\psi \mid M_1, V] = f_1(g_1(\psi))
  \left| \frac{\partial g_1(\psi)}{\partial \psi} \right|
  = N(\psi_1; 0, V^{-1}) \times N(\psi_2; 0, V^{-1}) \times N(\psi_3; 0, V^{-1})
    \times N(\psi_4; 0, V^{-1}) \times \mathrm{Ga}(V; 3.29, 7.80) \times |I_5|.$$

Repeating this process for each model we obtain the model-specific priors:

$$[\psi \mid M_k, V] = \prod_{i=1}^{4} N(\psi_i; 0, (n_k V)^{-1})
  \times \mathrm{Ga}(V; 3.29, 7.80) \times |I_5|$$

where $n_k$ is the dimension of the vector $\beta^{(k)}$.

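For generating a chain of model indicators, we used a direct draw from
the full conditional:

$$\Pr(M_k \mid \psi, V) =
  \frac{\Pr(M_k) \prod_{j=1}^{4} \sqrt{n_k V}\, e^{-n_k V \psi_j^2 / 2}
        \prod_{i} (p_i^{(k)})^{y_i} (1 - p_i^{(k)})^{1 - y_i}}
       {\sum_{h=1}^{5} \Pr(M_h) \prod_{j=1}^{4} \sqrt{n_h V}\, e^{-n_h V \psi_j^2 / 2}
        \prod_{i} (p_i^{(h)})^{y_i} (1 - p_i^{(h)})^{1 - y_i}}$$

where $\mathrm{logit}(p_i^{(k)}) = \eta_i$ is the linear predictor of model
$k$ evaluated at the coefficients $\beta^{(k)}$ obtained from $\psi$, and
the second product runs over the observations.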



To estimate the Bayes factors we first fitted the five models, in each
case combining results from three different chains of length 500,000 after
discarding a burn-in. We then generated five chains using our Gibbs sampler
for the model indicator, starting each chain with a different model.
Following Link and Barker (2006) we first tuned the Gibbs sampler to visit
each model in roughly equal proportion. Mixing of the model indicators
appears rapid (Figure 2) and agreement with the Link and Barker (2006)
estimates is good after combining the results from the second half of
200,000 iterations of the five chains (Table 2).



j   $BF_{1j}$        $\Pr(M_j \mid y)$
1   1 (1)            0.893 (0.894)
2   31.3 (31.7)      0.029 (0.028)
3   12.3 (12.4)      0.073 (0.072)
4   274.6 (281.7)    0.003 (0.003)
5   383.4 (390.1)    0.002 (0.002)

Table 2: Estimates of Bayes factors $BF_{1j}$ for comparing models 1 and
$j$ and estimates of posterior model probabilities under constant prior
model probabilities $\Pr(M_j) = 0.2$. Corresponding estimates from Link and
Barker (2006) are given in parentheses.

Figure 2 about here



For method two we drew a sample of 10,000 values$^1$ for $\psi$ from each
chain, leading to an estimate of the transition matrix of:

$$\begin{pmatrix}
0.8172 & 0.0870 & 0.0847 & 0.0088 & 0.0024 \\
0.0858 & 0.8086 & 0.0107 & 0.0755 & 0.0195 \\
0.0854 & 0.0102 & 0.8233 & 0.0759 & 0.0052 \\
0.0081 & 0.0749 & 0.0781 & 0.7884 & 0.0504 \\
0.0026 & 0.0176 & 0.0057 & 0.0498 & 0.9244
\end{pmatrix}$$

with steady-state marginal distribution
$(0.1986, 0.1975, 0.2016, 0.1989, 0.2034)'$.

3.3 Simple binomial 

In both of the above examples, the bijections from $\psi$ to
$\Theta^{(k)}$ are simple 1-1 mappings with the Jacobian of the
transformation an identity matrix. Now consider an example where
$Y_i \sim B(n_i, p_i)$ and we have observations $y_1 = 8$, $n_1 = 20$,
$y_2 = 16$, and $n_2 = 30$. What is the evidence for $p_1 \neq p_2$ against
$p_1 = p_2 = \pi$? To compute an appropriate Bayes factor we fit two models:

1. Model 1: $(p_1, p_2)$ with independent $\mathrm{Be}(\alpha_p, \beta_p)$ priors

2. Model 2: $p_1 = p_2 = \pi$ with a $\mathrm{Be}(\alpha_\pi, \beta_\pi)$ prior on $\pi$.

$^1$ Only 10,000 samples were drawn due to the large RAM requirements on the
desktop. This number can easily be increased by writing batches of such
draws to the hard-drive.

For model 1, we assign $\psi = (p_1, p_2)'$. It seems natural in moving
from model 1 to model 2 that the average $\bar{\psi} = (\psi_1 + \psi_2)/2$
should provide a good candidate for $\pi$. Thus, our bijections can be
written as:

Model 1: $I_2 \times \psi = \begin{pmatrix} p_1 \\ p_2 \end{pmatrix}$

and

Model 2: $\begin{pmatrix} 1/2 & 1/2 \\ 0 & 1 \end{pmatrix} \times \psi
= \begin{pmatrix} \pi \\ u \end{pmatrix}$

where $I_2$ is a $2 \times 2$ identity matrix and $u$ an appropriate
supplemental variable. Our Gibbs sampler then proceeds as follows:

1. Within models the full conditional distributions for model-specific
parameters are of known form since we have conditional (on the model)
conjugacy:

- Under Model 1 we sample $p_1 \sim \mathrm{Be}(8 + \alpha_p, 12 + \beta_p)$
and $p_2 \sim \mathrm{Be}(16 + \alpha_p, 14 + \beta_p)$ and then compute
$\psi = (p_1, p_2)'$.

- Under Model 2 we sample
$\pi \sim \mathrm{Be}(24 + \alpha_\pi, 26 + \beta_\pi)$ and
$u \sim \mathrm{Be}(\alpha_u, \beta_u)$ and then set
$\psi_1 = 2\pi - u$ and $\psi_2 = u$.
2. Between models we set:

- $\Pr(M_1 \mid \cdot) \propto \Pr(M_1) \times \psi_1^{8+\alpha_p-1}
(1-\psi_1)^{12+\beta_p-1}\, \psi_2^{16+\alpha_p-1} (1-\psi_2)^{14+\beta_p-1}\,
1_{\psi_1 \in (0,1)}\, 1_{\psi_2 \in (0,1)}$

- $\Pr(M_2 \mid \cdot) \propto \Pr(M_2) \times \bar{\psi}^{24+\alpha_\pi-1}
(1-\bar{\psi})^{26+\beta_\pi-1}\, \mathrm{Be}(\psi_2; \alpha_u, \beta_u)\,
1_{\bar{\psi} \in (0,1)}\, 1_{\psi_2 \in (0,1)} \times \frac{1}{2}$

where $1_E$ denotes the indicator of the event $E$,
$\mathrm{Be}(\psi_2; \alpha_u, \beta_u)$ is the prior density of the
supplemental variable $u = \psi_2$, the factor $1/2$ is the Jacobian
$|\partial g_2(\psi)/\partial\psi|$, and the proportionality constant is
the same for each model (the normalizing constants of the Beta priors on
$p_1$, $p_2$ and $\pi$ are omitted; they equal 1 for the uniform priors
used below). We then sample the model indicator by a direct draw from a
categorical distribution with sample space $\{1, 2\}$ and parameter vector
$(\pi_1, \pi_2)'$ where

$$\pi_k = \frac{\Pr(M_k \mid \cdot)}{\Pr(M_1 \mid \cdot) + \Pr(M_2 \mid \cdot)}.$$

To fit the models we used as prior parameters
$\alpha_p = \beta_p = \alpha_\pi = \beta_\pi = 1$ (i.e., independent
$U(0, 1)$ priors). We also set $\alpha_u = \beta_u = 15$ so that draws for
$u$ were similar to draws for $p_2$. Convergence of the chain for the
posterior probability of model 2 was rapid (Figure 3). Combining results
from 100,000 iterations of the two chains we obtained $BF_{21} = 1.92$ and
$\Pr(M_2 \mid y) = 0.658$. For both models the marginal distribution of the
data is straightforward to compute and the exact solution for the posterior
model probability is 0.6580.

Figure 3 about here

Using method two with a sample of 100,000 values of $\psi$ from each chain
we estimate the transition matrix as:

$$\begin{pmatrix} 0.4318 & 0.5682 \\ 0.2951 & 0.7049 \end{pmatrix}$$

with steady-state marginal distribution $(0.3423, 0.6577)'$.
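The whole binomial example fits in a short script, which makes it a convenient test bed. The sketch below is one possible coding of the two-step Gibbs sampler under the uniform priors and $\alpha_u = \beta_u = 15$ used above, with equal prior model probabilities assumed; the function names, seed, and chain length are choices made for this illustration. It also computes the exact answer from the Beta-binomial marginal likelihoods for comparison.

```python
import math
import random

def log_beta_fn(a, b):
    """Log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def log_weights(psi1, psi2, prior_m2=0.5, a_u=15.0, b_u=15.0):
    """Log of Pr(M_k) x likelihood x [psi | M_k], k = 1, 2, under U(0, 1) priors."""
    psi_bar = 0.5 * (psi1 + psi2)
    if 0.0 < psi1 < 1.0:
        lw1 = (math.log(1.0 - prior_m2)
               + 8 * math.log(psi1) + 12 * math.log(1.0 - psi1)
               + 16 * math.log(psi2) + 14 * math.log(1.0 - psi2))
    else:
        lw1 = -math.inf                 # indicator 1{psi_1 in (0, 1)} fails
    lw2 = (math.log(prior_m2)
           + 24 * math.log(psi_bar) + 26 * math.log(1.0 - psi_bar)
           + (a_u - 1) * math.log(psi2) + (b_u - 1) * math.log(1.0 - psi2)
           - log_beta_fn(a_u, b_u)      # Be(psi_2; a_u, b_u), the density of u
           + math.log(0.5))             # Jacobian |dg_2 / dpsi| = 1/2
    return lw1, lw2

def run_chain(n_iter, rng, start=1):
    """Alternate palette and model updates; return the chain mean of Pr(M_2 | psi, y)."""
    model, rb_sum = start, 0.0
    for _ in range(n_iter):
        if model == 1:                  # conjugate draws under Model 1
            psi1, psi2 = rng.betavariate(9, 13), rng.betavariate(17, 15)
        else:                           # conjugate draw of pi, prior draw of u
            pi_, u = rng.betavariate(25, 27), rng.betavariate(15, 15)
            psi1, psi2 = 2.0 * pi_ - u, u
        lw1, lw2 = log_weights(psi1, psi2)
        p2 = 1.0 / (1.0 + math.exp(lw1 - lw2))
        rb_sum += p2
        model = 2 if rng.random() < p2 else 1
    return rb_sum / n_iter

estimate = run_chain(50_000, random.Random(42))

# Exact values from the marginal likelihoods (binomial coefficients cancel).
log_bf21 = log_beta_fn(25, 27) - log_beta_fn(9, 13) - log_beta_fn(17, 15)
exact = 1.0 / (1.0 + math.exp(-log_bf21))
```

Dropping either the density of $u$ or the Jacobian factor from `log_weights` moves the estimate away from the exact value, which makes this a useful diagnostic when constructing bijections for less tractable models.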



4 Discussion 

Bayesian inference offers an appealing framework for multimodel inference
but the difficulties of computing Bayes factors, or equivalently posterior
model probabilities, can be a barrier to implementation. Being able to
independently fit models and then post-process them using RJMCMC as we
have described here offers a partial solution to the problem.

An issue often raised in objection to Bayesian multimodel inference (BMI)
based on Bayes factors is that one must assume that the true model is in
the model set. As we have argued elsewhere (Link and Barker 2006, 2010),
this is a red herring - conditioning on a model set is no less innocuous
than conditioning on a model, as must be done for any form of statistical
inference. Conditioning on models and model sets is done for operational
convenience - we no more believe that truth is in our model set than we
believe that the model $y_i \sim N(\mu, \sigma^2)$ can ever be a true and
complete representation of any set of data.

A more serious issue with BMI is priors on parameters; it is well-known
that Bayes factors are sensitive to choice of priors, particularly vague
priors. Our view is that priors should be chosen so that common features of
interest in each model have the same prior uncertainty associated with
them. An attempt at such an approach is illustrated by the West Coast trout
example in which priors were constructed based on the logit of the return
probability for trout that had typical values of the covariates. Such an
approach we have previously referred to as "nonpreferential" (Link and
Barker 2008).

Choice of efficient bijections for moving between models requires some
thought. Although our approach simplifies this problem to one of choosing
$K$ such bijections, choices must be made. Features that are of interest
and common across models can be exploited in choosing bijections as well as
providing a basis for constructing non-preferential priors. Generalized
linear model formulations such as represented in our trout example offer
one means for constructing bijections. One area of possible fruitful
investigation in this context is that our palette representation of RJMCMC
appears to be connected to the use of importance link functions
(MacEachern and Peruggia 2000). There may be benefits from considering this
connection from the point of view of determining transformations
$g(\psi)$ in our representation that lead to more efficient Monte Carlo
estimation of posterior model probabilities.

Our description of RJMCMC as simple Gibbs sampling with a direct draw
from a known distribution for model probabilities is a further useful
simplification. Moves can be made to any model in the set $\mathcal{M}$
using samples from the full-conditional distribution for model indicators;
we are not restricted to moves between pairs of models. Methods that
involve moves to neighbours have been used to automate search across very
high dimensional model space. We are skeptical about the value of such
algorithms as they induce a particular prior on parameters. Such default
constructions may lead to priors that are prejudicial, in which case
posterior model probabilities would be more a reflection of these prior
prejudices than data-informed posterior weighting.
Figure 1: Index plot of the cumulative posterior probability
$p = \Pr(M = 2)$ starting with model 1 (red) or model 2. The horizontal
black line corresponds to the exact result.

Figure 2: Index plot of the cumulative posterior model probabilities. Each
plot represents a different model probability and the different colored
chains represent different starting values. The black line corresponds to
the value targeted during tuning.

Figure 3: Index plot of the cumulative posterior probability
$p = \Pr(M = 2)$ starting with model 1 (red) or model 2. The horizontal
black line corresponds to the exact result.